41 research outputs found
Emulation of random output simulators
Computer models, or simulators, are widely used in a range of scientific fields to aid understanding of the processes involved and to make predictions. Such simulators are often computationally demanding and are thus not amenable to statistical analysis. Emulators provide a statistical approximation, or surrogate, for the simulators, accounting for the additional approximation uncertainty. This thesis develops a novel sequential screening method to reduce the set of simulator variables considered during emulation. This screening method is shown to require fewer simulator evaluations than existing approaches. Utilising the lower-dimensional active variable set simplifies subsequent emulation analysis. For random output, or stochastic, simulators, the output dispersion, and thus variance, is typically a function of the inputs. This work extends the emulator framework to account for such heteroscedasticity by constructing two new heteroscedastic Gaussian process representations, and proposes an experimental design technique to optimally learn the model parameters. The design criterion is an extension of Fisher information to heteroscedastic variance models. Replicated observations are efficiently handled in both the design and model inference stages. Through a series of simulation experiments on both synthetic and real-world simulators, the emulators inferred on optimal designs with replicated observations are shown to outperform equivalent models inferred on space-filling replicate-free designs in terms of both model parameter uncertainty and predictive variance.
EThOS - Electronic Theses Online Service, United Kingdom
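To illustrate why replicated runs are informative when the output variance depends on the inputs, the sketch below summarises the replicates at each design point by their sample mean and sample variance. The simulator `simulate`, its noise model and the helper `summarise_replicates` are hypothetical stand-ins chosen for illustration; they are not the models or the inference scheme used in the thesis.

```python
import math
import random
import statistics


def simulate(x, rng):
    """Hypothetical stochastic simulator: the noise s.d. grows with the input x."""
    noise_sd = 0.1 + 0.5 * x  # assumed input-dependent dispersion
    return math.sin(x) + rng.gauss(0.0, noise_sd)


def summarise_replicates(design, n_rep, seed=0):
    """Run n_rep replicates at each design point and return
    a list of (x, sample mean, sample variance) tuples."""
    rng = random.Random(seed)
    summary = []
    for x in design:
        runs = [simulate(x, rng) for _ in range(n_rep)]
        summary.append((x, statistics.mean(runs), statistics.variance(runs)))
    return summary


if __name__ == "__main__":
    for x, mean, var in summarise_replicates([0.0, 1.0, 2.0], n_rep=200):
        print(f"x={x:.1f}  mean={mean:.3f}  variance={var:.3f}")
```

The per-site sample variances give a direct, if noisy, view of the heteroscedasticity that a replicate-free design can only reveal indirectly through the fitted model.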
Managing uncertainty in complex stochastic models: design and emulation of a Rabies model
In this paper we present a novel method for emulating a stochastic, or random output, computer model and show its application to a complex rabies model. The method is evaluated in terms of both accuracy and computational efficiency on synthetic data and on the rabies model. We address the issue of experimental design and provide empirical evidence on the effectiveness of utilizing replicate model evaluations compared to a space-filling design. We employ the Mahalanobis error measure to validate the heteroscedastic Gaussian process based emulator predictions for both the mean and (co)variance. The emulator allows efficient screening to identify important model inputs and a better understanding of the complex behaviour of the rabies model.
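As a rough illustration of the Mahalanobis error measure mentioned above, the sketch below computes the squared Mahalanobis distance between held-out simulator outputs and an emulator's predictive mean, weighted by the predictive covariance. The function names are hypothetical, and the small Cholesky routine is only there to keep the example free of third-party dependencies. If the Gaussian predictive distribution were exactly right, this quantity would follow a chi-squared distribution with as many degrees of freedom as there are validation points, so values far from that number hint at a poorly calibrated mean or covariance.

```python
import math


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                d = a[i][i] - s
                if d <= 0.0:
                    raise ValueError("covariance is not positive definite")
                low[i][j] = math.sqrt(d)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def mahalanobis_sq(y, mean, cov):
    """Squared Mahalanobis distance (y - mean)^T cov^{-1} (y - mean)."""
    low = cholesky(cov)
    resid = [yi - mi for yi, mi in zip(y, mean)]
    # Forward substitution solves low @ z = resid; the distance is then z . z.
    z = []
    for i, r in enumerate(resid):
        s = sum(low[i][k] * z[k] for k in range(i))
        z.append((r - s) / low[i][i])
    return sum(v * v for v in z)


if __name__ == "__main__":
    d2 = mahalanobis_sq([1.0, 1.0], [0.0, 0.0], [[2.0, 1.0], [1.0, 2.0]])
    print(d2)
```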
Approximately optimal experimental design for heteroscedastic Gaussian process models
This paper presents a greedy Bayesian experimental design criterion for heteroscedastic Gaussian process models. The criterion is based on the Fisher information and is optimal in the sense of minimizing parameter uncertainty for likelihood-based estimators. We demonstrate the validity of the criterion under different noise regimes and present experimental results from a rabies simulator to show the effectiveness of the resulting approximately optimal designs.
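The idea of a greedy, Fisher-information-based design can be sketched on a much simpler model than the paper's. The example below assumes independent Gaussian observations with a linear mean b0 + b1*x and a log-linear variance exp(g0 + g1*x); it is not the heteroscedastic Gaussian process criterion itself. For this assumed model the Fisher information is block diagonal, with a mean block that sums [1, x][1, x]^T / sigma^2(x) and a variance block that sums 0.5 [1, x][1, x]^T. All function names and the parameter values are hypothetical.

```python
import math


def fisher_blocks(design, gamma):
    """Entries (a, b, c) of the symmetric 2x2 mean and variance blocks
    [[a, b], [b, c]] of the Fisher information for the assumed model."""
    g0, g1 = gamma
    ma = mb = mc = va = vb = vc = 0.0
    for x in design:
        w = math.exp(-(g0 + g1 * x))  # 1 / sigma^2(x)
        ma += w
        mb += w * x
        mc += w * x * x
        va += 0.5
        vb += 0.5 * x
        vc += 0.5 * x * x
    return (ma, mb, mc), (va, vb, vc)


def log_det_information(design, gamma):
    """Log-determinant of the block-diagonal Fisher information."""
    (ma, mb, mc), (va, vb, vc) = fisher_blocks(design, gamma)
    det_mean = ma * mc - mb * mb
    det_var = va * vc - vb * vb
    if det_mean <= 0.0 or det_var <= 0.0:
        return float("-inf")
    return math.log(det_mean) + math.log(det_var)


def greedy_design(candidates, initial, n_add, gamma):
    """Add n_add points one at a time, each maximising the log-determinant.
    Candidates may be chosen repeatedly, which yields replicated points."""
    design = list(initial)
    for _ in range(n_add):
        best = max(candidates,
                   key=lambda x: log_det_information(design + [x], gamma))
        design.append(best)
    return design


if __name__ == "__main__":
    grid = [i / 10 for i in range(11)]
    print(greedy_design(grid, initial=[0.0, 1.0], n_add=4, gamma=(-2.0, 2.0)))
```

Because the criterion depends on the unknown variance parameters, a design of this kind is only locally optimal for the assumed `gamma`; in practice a prior guess or a sequential update would be needed.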
Emulation of dynamic computer models with multivariate output
This preliminary report describes work carried out as part of work package 1.2 of the MUCM research project. The report is split into two parts: the first part (Sections 1 and 2) summarises the state of the art in emulation of computer models, while the second presents some initial work on the emulation of dynamic models. In the first part, we describe the basics of emulation, introduce the notation and put together the key results for the emulation of models with single and multiple outputs, with or without the use of a mean function. In the second part, we present preliminary results on the chaotic Lorenz 63 model. We look at emulation of a single time step, and at repeated application of the emulator for sequential prediction. After some design considerations, the emulator is compared with the exact simulator on a number of runs to assess its performance. Several general issues related to emulating dynamic models are raised and discussed. Current work on the larger Lorenz 96 model (40 variables) is presented in the context of dimension reduction, with results to be provided in a follow-up report. The notation used in this report is summarised in the appendix.
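The single-time-step setting can be made concrete with a small sketch of the Lorenz 63 system. The code below defines the standard Lorenz 63 equations with the commonly used parameters (sigma = 10, rho = 28, beta = 8/3), a one-step map, and a loop that applies any one-step map repeatedly. The fourth-order Runge-Kutta integrator, the step size of 0.01 and the function names are assumptions made for illustration; the report's own integration settings are not stated here. An emulator of the one-step map could be passed to `iterate` in place of `rk4_step`.

```python
def lorenz63_rhs(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz 63 equations."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def rk4_step(state, dt=0.01):
    """Advance the state by one time step with classical Runge-Kutta (assumed)."""
    def shifted(base, slope, h):
        return tuple(b + h * s for b, s in zip(base, slope))

    k1 = lorenz63_rhs(state)
    k2 = lorenz63_rhs(shifted(state, k1, dt / 2.0))
    k3 = lorenz63_rhs(shifted(state, k2, dt / 2.0))
    k4 = lorenz63_rhs(shifted(state, k3, dt))
    return tuple(
        s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def iterate(step, state, n_steps):
    """Apply a one-step map repeatedly and return the whole trajectory."""
    trajectory = [state]
    for _ in range(n_steps):
        state = step(state)
        trajectory.append(state)
    return trajectory


if __name__ == "__main__":
    path = iterate(rk4_step, (1.0, 1.0, 1.0), 1000)
    print(path[-1])
```

Because the system is chaotic, small one-step errors grow under repeated application, which is why sequential prediction is a harder test of an emulator than single-step accuracy.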
Dimensionality reduction in complex models
As part of the Managing Uncertainty in Complex Models (MUCM) project, research at Aston University will develop methods for dimensionality reduction of the input and/or output spaces of models, as seen within the emulator framework. To this end, this report describes a framework for generating toy datasets whose underlying structure is understood. The datasets are intended to facilitate early investigations of dimensionality reduction methods and to give a deeper understanding of the algorithms employed, both in terms of how effective they are for given types of models and situations, and in terms of their speed in applications and how this scales with various factors. The framework allows the evaluation of both screening and projection approaches to dimensionality reduction. We also describe the screening and projection methods currently under consideration and present some preliminary results. The aim of this draft of the report is to solicit feedback from the project team on the dataset generation framework, the methods we propose to use, and suggestions for extensions that should be considered.
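A toy dataset with known structure might look like the following sketch, in which the output depends on only a chosen subset of the inputs, so that a screening method can be checked against the truth. The quadratic test function, the one-at-a-time finite-difference screening and all names are assumptions made for illustration; they are not the report's framework or the methods it evaluates.

```python
import random


def make_toy_model(active, seed=0):
    """Return a deterministic function of an input vector that depends
    only on the inputs whose indices are listed in `active`."""
    rng = random.Random(seed)
    weights = {i: rng.uniform(1.0, 2.0) for i in active}

    def model(x):
        return sum(w * x[i] ** 2 for i, w in weights.items())

    return model


def mean_abs_effects(model, n_inputs, n_base=20, delta=0.1, seed=1):
    """One-at-a-time screening: average absolute finite-difference effect
    of each input over random base points in the unit hypercube."""
    rng = random.Random(seed)
    totals = [0.0] * n_inputs
    for _ in range(n_base):
        x = [rng.uniform(0.0, 1.0 - delta) for _ in range(n_inputs)]
        f0 = model(x)
        for i in range(n_inputs):
            perturbed = list(x)
            perturbed[i] += delta
            totals[i] += abs(model(perturbed) - f0) / delta
    return [t / n_base for t in totals]


if __name__ == "__main__":
    toy = make_toy_model(active=[1, 4])
    effects = mean_abs_effects(toy, n_inputs=6)
    print([i for i, e in enumerate(effects) if e > 1e-9])
```

Since the active set is fixed when the model is built, the inputs flagged by any screening method can be compared directly with `active`, and the number of model evaluations it needed can be counted.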
Simple approximate MAP Inference for Dirichlet processes
The Dirichlet process mixture (DPM) is a ubiquitous, flexible Bayesian
nonparametric statistical model. However, full probabilistic inference in this
model is analytically intractable, so that computationally intensive techniques
such as Gibbs sampling are required. As a result, DPM-based methods, which
have considerable potential, are restricted to applications in which
computational resources and time for inference are plentiful. For example, they
would not be practical for digital signal processing on embedded hardware,
where computational resources are at a serious premium. Here, we develop
simplified yet statistically rigorous approximate maximum a-posteriori (MAP)
inference algorithms for DPMs. The resulting algorithm is as simple as K-means
clustering and performs in experiments as well as Gibbs sampling, while requiring
only a fraction of the computational effort. Unlike related small variance
asymptotics, our algorithm is non-degenerate and so inherits the "rich get
richer" property of the Dirichlet process. It also retains a non-degenerate
closed-form likelihood which enables standard tools such as cross-validation to
be used. This is a well-posed approximation to the MAP solution of the
probabilistic DPM model.
Comment: 11 pages, 4 figures, 5 tables
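The "rich get richer" property referred to in the abstract can be seen in the Chinese restaurant process, the standard sequential description of cluster assignments under a Dirichlet process. The sketch below is only a simulation of that prior, not the paper's MAP inference algorithm; the function name and the concentration value are hypothetical. A new point joins an existing cluster with probability proportional to the cluster's current size and opens a new cluster with probability proportional to `alpha`.

```python
import random


def sample_crp(n_points, alpha, seed=0):
    """Draw cluster assignments from a Chinese restaurant process.
    Returns (assignments, counts), where counts[k] is the size of cluster k."""
    rng = random.Random(seed)
    counts = []
    assignments = []
    for n in range(n_points):
        # Existing cluster k has weight counts[k]; a new cluster has weight alpha.
        u = rng.uniform(0.0, n + alpha)
        acc = 0.0
        chosen = len(counts)  # default: open a new cluster
        for k, c in enumerate(counts):
            acc += c
            if u < acc:
                chosen = k
                break
        if chosen == len(counts):
            counts.append(0)
        counts[chosen] += 1
        assignments.append(chosen)
    return assignments, counts


if __name__ == "__main__":
    _, sizes = sample_crp(n_points=500, alpha=1.0)
    print(sorted(sizes, reverse=True))
```

Large clusters attract new points at a rate proportional to their size, so a few clusters typically dominate; this is the behaviour that, according to the abstract, small variance asymptotics lose and the proposed algorithm retains.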